An Empirical Study of Multilingual Spoken Term Detection

Authors

  • Zejun Ma
  • Xiaorui Wang
  • Bo Xu
Abstract

This paper describes the design of a multilingual spoken term detection (STD) system built on the CALLHOME and CALLFRIEND multilingual databases published by the Linguistic Data Consortium. Seven languages, namely Arabic, English, German, Japanese, Korean, Mandarin Chinese and Spanish, are used to train and evaluate the STD system. As the core module of our language-general STD system, the multilingual automatic speech recogniser combines the acoustic and language models of the seven languages into a uniform model set. Much of our work focuses on comparing multilingual acoustic models: the conventional global phoneme set (GPS) based method and the recently proposed subspace GMM (SGMM) method [1] are investigated in detail. The experimental results demonstrate the viability of our multilingual STD system. The resulting multilingual system not only supports seven different languages but also gives satisfactory performance gains over the monolingual systems.


Similar resources

An empirical study of multilingual and low-resource spoken term detection using deep neural networks

As a further step beyond our previous work, this paper focuses on how to improve the multilingual spoken term detection (STD) system by using a shared-hidden-layer multilingual DNN (SHL-MDNN). Seven languages, namely Arabic, English, German, Japanese, Korean, Mandarin and Spanish, are used in our experiments. Compared with our original multilingual STD system, which is based on Subspace GMMs (SGMM...


SpeeD @ MediaEval 2014: Spoken Term Detection with Robust Multilingual Phone Recognition

In this paper, we attempt to resolve the Spoken Term Detection (STD) problem for under-resourced languages by phone recognition with a multilingual acoustic model of three languages (Albanian, English and Romanian). The Power Normalized Cepstral Coefficients (PNCC) features are used for improved robustness to noise.


English spoken term detection in multilingual recordings

This paper investigates the automatic detection of English spoken terms in a multi-language scenario over real lecture recordings. Spoken Term Detection (STD) is based on an LVCSR system whose output is represented as word lattices. The lattices are then searched for the required terms. Processed lectures are mainly composed of English, French and Italian recordings where the languag...


Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection

We investigate modeling strategies for English code-switched words as found in a Swahili spoken term detection system. Code switching, where speakers switch language in a conversation, occurs frequently in multilingual environments, and typically deteriorates STD performance. Analysis is performed in the context of the IARPA Babel program which focuses on rapid STD system development for under-...


SpeeD @ MediaEval 2015: Multilingual Phone Recognition Approach to Query by Example STD

In this paper, we attempt to solve the Spoken Term Detection (STD) problem for under-resourced languages by a phone recognition approach within the Automatic Speech Recognition (ASR) paradigm, with multilingual acoustic models from six languages (Albanian, Czech, English, Hungarian, Romanian and Russian). The Power Normalized Cepstral Coefficients (PNCC) features are used for improved robustnes...




Publication date: 2011